Species tree estimation using ASTRAL: how many genes are enough?

Authors

  • Shubhanshu Shekhar
  • Sébastien Roch
  • Siavash Mirarab
Abstract

Species tree reconstruction from genomic data is increasingly performed using methods that account for sources of gene tree discordance such as incomplete lineage sorting. One popular method for reconstructing species trees from unrooted gene tree topologies is ASTRAL. In this paper, we derive theoretical sample complexity results for the number of genes required by ASTRAL to guarantee reconstruction of the correct species tree with high probability. We also validate those theoretical bounds in a simulation study. Our results indicate that ASTRAL requires $O(f^{-2} \log n)$ gene trees to reconstruct the species tree correctly with high probability, where $n$ is the number of species and $f$ is the length of the shortest branch in the species tree. Our simulations, which are the first to test ASTRAL explicitly under the anomaly zone, show trends consistent with the theoretical bounds and also provide some practical insights on the conditions where ASTRAL works well.
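
As a rough illustration of how the bound stated in the abstract scales, the following Python sketch evaluates log(n) / f^2 up to a multiplicative constant. The function name `genes_needed` and the constant `c` are hypothetical placeholders: the abstract gives only the asymptotic order, not a constant, so the absolute values are not meaningful and only ratios between settings should be read from it.

```python
import math


def genes_needed(f: float, n: int, c: float = 1.0) -> float:
    """Evaluate c * log(n) / f**2, the order of the bound in the abstract.

    f: length of the shortest branch in the species tree (must be > 0)
    n: number of species (must be >= 2)
    c: hypothetical placeholder constant; the abstract does not specify one
    """
    if f <= 0:
        raise ValueError("shortest branch length f must be positive")
    if n < 2:
        raise ValueError("number of species n must be at least 2")
    return c * math.log(n) / f ** 2


# Halving the shortest branch length multiplies the order of the bound by 4.
ratio_f = genes_needed(0.05, 100) / genes_needed(0.1, 100)

# Squaring the number of species (100 -> 10000) only doubles it.
ratio_n = genes_needed(0.1, 10_000) / genes_needed(0.1, 100)

print(f"halving f: x{ratio_f:.2f}, squaring n: x{ratio_n:.2f}")
```

Under this scaling, the shortest branch length dominates: the dependence on f is quadratic, while the dependence on the number of species is only logarithmic.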

Similar resources

ASTRAL: genome-scale coalescent-based species tree estimation

MOTIVATION Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges to species tree estimation. One of the most frequent causes for conflicting topologies between gene trees and speci...

ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

MOTIVATION The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods o...

The Impact of Missing Data on Species Tree Estimation.

Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree es...

Supplementary Material to ASTRAL: Genome-Scale Coalescent-Based Species Tree Estimation

3 Experimental Details
  3.1 Extra trees for Zhong et al. biological dataset
  3.2 Methods and Commands
    3.2.1 Gene tree estimation
    3.2.2 ASTRAL
    3.2.3 BUCKy-pop
    3.2.4 MRP and MRL ...

Determining Difference in Evolutionary Variation of Bacterial RecA proteins vs 16SrRNA Genes by using 16s_Toxonomy Tree

Background and Aims: The rate of variation in various genes of a bacterial species is different during evolution. Therefore, in systematic bacterial studies many researchers compare the phylogenetic tree of a particular gene to the standard tree of an rRNA gene. Regarding the importance of 16SrRNA gene and the evolutional process of RecA protein family, we investigated the changes in the select...

Journal:
  • IEEE/ACM transactions on computational biology and bioinformatics

Publication date: 2017